D.9 MATLAB + Statistics Toolbox
Approximate Cost: $2,150 for an individual license of MATLAB®; $1,000 for an individual license of Statistics Toolbox
Source: MathWorks (www.mathworks.com)
Current Version: R2013a
Operating System Needs: Windows 7, Windows Vista, Windows XP, Mac OS X
Input Structure: Text files or Excel files are easiest, but any data format can be imported using a script.
Overview
MATLAB® is a software application that consists of a high-level programming language, interactive environment, and execution environment for data analysis, visualization, and numerical computation. The basic MATLAB software allows you to fit regression lines, calculate summary statistics, and plot data. MATLAB is flexible and can perform additional analyses using scripts and add-ins.
MATLAB also has a wide variety of visualization options, including line plots, bar plots, histograms, pie charts, topological maps, and images. Many of these plot types are available in both 2-D and 3-D and can be animated to show changes over time. More advanced users can write a script to import, analyze, and plot data. Scripting is helpful when the same series of plots is generated on a regular basis (such as plots for quarterly reports).
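As a rough illustration of the scripting workflow described above, the following minimal sketch computes summary statistics, fits a regression line, and plots the result using only base MATLAB functions. The well name, the concentration values, and the variable names are hypothetical; in practice the data would be imported from a text or Excel file rather than typed in.

```matlab
% Hypothetical quarterly results for one well (illustrative values only)
quarter = (1:8)';                                  % sampling event index
conc    = [12.1 11.4 10.9 10.2 9.8 9.9 9.1 8.7]';  % concentration, ug/L

% Summary statistics (base MATLAB)
stats.n      = numel(conc);
stats.mean   = mean(conc);
stats.median = median(conc);
stats.std    = std(conc);

% Least-squares regression line: conc ~ p(1)*quarter + p(2)
p      = polyfit(quarter, conc, 1);
fitted = polyval(p, quarter);

% Plot the observations and the fitted line
figure;
plot(quarter, conc, 'o', quarter, fitted, '-');
xlabel('Sampling event (quarter)');
ylabel('Concentration (\mug/L)');
legend('Observed', 'Linear fit');
title('Hypothetical well MW-1');
```

Saved as a script, the same commands can be rerun each quarter after the data vectors are updated, which is the kind of routine report figure the text has in mind.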
With the Statistics Toolbox and some basic scripting, MATLAB can also be used for many other types of data analysis and visualizations. Additional types of plots available in the statistics package include box plots, probability distributions, additional types of histograms (including 3-D histograms and scatter histograms), quantile-quantile plots, and multivariate analysis plots (including dendrograms, biplots, and parallel coordinate charts). You can also perform hypothesis tests, analysis of variance (ANOVA), cluster analysis, more complicated regression and classification, bootstrapping, confidence interval calculations, and data transformations.
With this additional functionality, MATLAB can generate customized plots of data to analyze distributions, compare or display data, and visualize temporal changes. For example, it is possible to compare two data sets to determine if concentrations changed over time, to determine appropriate background levels, and to compare site data with background levels or cleanup goals. The Statistics Toolbox has a number of interactive applications for analysis of covariance, distribution fitting, density and distribution plots, contour plots, polynomial fitting, random number generation, regression diagnostics, robust regression, and response surface demonstration.
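The comparison of site data with background described above could be sketched as follows. This is only an illustration: it assumes the Statistics Toolbox is installed (for `ttest2` and `ranksum`), and the two data vectors are hypothetical values, not results from any real site.

```matlab
% Hypothetical concentrations in ug/L (illustrative values only)
background = [4.2 3.9 5.1 4.6 4.0 4.8 4.4 5.0];
site       = [6.3 5.8 7.1 6.6 5.9 7.4 6.1 6.8];

% Two-sample t-test (assumes roughly normal data with equal variances)
[hT, pT] = ttest2(site, background);

% Wilcoxon rank sum test (nonparametric alternative)
[pW, hW] = ranksum(site, background);

% h = 1 means the null hypothesis of no difference is rejected at the 5% level
fprintf('t-test:   h = %d, p = %.4g\n', hT, pT);
fprintf('rank sum: h = %d, p = %.4g\n', hW, pW);
```

Running both a parametric and a nonparametric test side by side is one simple way to check whether a conclusion depends on the normality assumption.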
The capabilities designated in the table below for "Capability with Scripts/Add-ins" are based on available scripts.
| Statistical Method | Capability As Is (using MATLAB + Statistics Toolbox) | Capability with Scripts/Add-Ins |
|---|---|---|
| Handling of NDs | | |
| | ● | ● |
| | | ● |
| | ● | ● |
| | ● | ● |
| Exploratory/Diagnostic Tools | | |
| Summary Statistics | ● | ● |
| | ● | ● |
| | ● | ● |
| Data transformations | ● | ● |
| Statistical Design | | |
| Statistical Power | ● | ● |
| | | ● |
| Contaminant ranking | | ● |
| | | ● |
| Statistical Limits | | |
| | ● | ● |
| | ● | ● |
| | ● | ● |
| Testing Compliance Limits | ● | ● |
| Graphics | | |
| Plots/Charts | ● | ● |
| Batch plots | ● | ● |
| Tweaking of graphics | ● | ● |
| Statistical Comparisons | | |
| | ● | ● |
| | ● | ● |
| Spatial Analysis | | |
| Geostatistics/Mapping | ● | ● |
| | ● | ● |
| | ● | ● |
| Regression/Time Series | | |
| | ● | ● |
| | | ● |
| | ● | ● |
| | ● | ● |
| | | ● |
| | ● | ● (with Econometrics Toolbox) |
| Multivariate Analysis | | |
| Multiple regression | ● | ● |
| Factor/Discriminant analysis | ● | ● |
| | ● | ● |
Capability Ratings:
N/A = Not applicable or not available
● = Full capability
◒ = Some capability
(blank cell) = No capability
Add-Ins Available
Add-ins relevant to groundwater statistics include Statistics Toolbox, Curve Fitting Toolbox (for fitting curves and surfaces to data as well as nonparametric modeling techniques, such as splines, interpolation, and smoothing), and Neural Network Toolbox (for data-fitting, pattern recognition, and clustering). There are also many other add-ins not directly relevant to groundwater statistics.
Ease of Use and Data Import
Versions of MATLAB are available for a variety of Windows, Macintosh, and Linux platforms. Most basic tasks in MATLAB require use of a script or understanding of basic commands, making it moderately difficult to use. Advanced users can generate scripts, which have the potential to perform almost any desired task. MathWorks offers both introductory and advanced classes.
MATLAB is matrix-based and has the capacity to handle and manipulate large quantities of data rapidly. Many data or file types can be directly imported into MATLAB, including spreadsheet, text, or image files. Advanced users can write scripts to import data from any format. The Database Toolbox add-in allows you to read data directly from databases. Once data are imported, MATLAB can easily transform them or perform other calculations on large data sets. Plots and image files from MATLAB can easily be saved as most common image types, written to Excel, or printed (either to a printer or to a PDF).
Types of Distributions
MATLAB accepts data of any distributional type. You can apply data transformations using a script or the Statistics Toolbox. The Statistics Toolbox includes functions and graphical tools to work with both parametric and nonparametric distributions, both continuous and discrete distributions, and both univariate and multivariate distributions. You can fit distributions to data, evaluate goodness of fit, generate statistical plots, generate probability density functions and cumulative distribution functions, and generate random and quasi-random number streams from distributions.
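A minimal sketch of a data transformation followed by a distribution fit and a goodness-of-fit check is shown below. It assumes the Statistics Toolbox is available (for `lognfit`, `lillietest`, and `normplot`), and the concentration values are hypothetical. The lognormal model is chosen only as an example; other distributions may suit a given data set better.

```matlab
% Hypothetical right-skewed concentrations in ug/L (illustrative values only)
x = [1.2 0.8 2.5 1.9 0.6 3.8 1.1 7.4 2.2 1.5 0.9 4.6]';

% Log transformation, commonly applied to right-skewed data
logx = log(x);

% Estimate lognormal parameters:
% parmhat(1) = mu and parmhat(2) = sigma of log(x)
parmhat = lognfit(x);

% Lilliefors test of normality applied to the log-transformed data;
% h = 0 means normality of log(x) is not rejected at the 5% level
[h, pval] = lillietest(logx);

% Normal probability plot of log(x) as a visual goodness-of-fit check
figure;
normplot(logx);
title('Normal probability plot of log-transformed data');
```

If the points on the probability plot fall close to the reference line and the test does not reject, a lognormal model is a plausible working assumption for the data.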
Visualization
MATLAB can be used to generate or display basic plots (including line, bar, and pie plots), topographic plots, images, and 3-D plots (including objects, volumes, and lines). With the added statistics package, MATLAB can also generate box plots, histograms, probability distributions, and quantile-quantile plots. You can customize plot styles as well, including lighting or camera angle on 3-D graphs. Series of plots can be animated to show temporal or geographical changes. While advanced users may use scripts to generate plots, other users can use a graphical user interface to perform most customization options.
Primary Uses for Groundwater Data Analysis
MATLAB can perform a wide variety of tasks related to data analysis, visualization, and numerical computation. MATLAB is particularly well-suited for computations in large data sets (including data transformations, hypothesis testing, background evaluations and regression/trend analysis). MATLAB can also generate customizable plots and images and can be an excellent tool for generating figures that must be updated on a regular basis, such as quarterly monitoring reports.
Benefits
- flexible and provides customization options to generate plots and images
- multi-purpose software that can be used for other applications
- capable of handling computations in large data sets
- able to evaluate background data to determine appropriate background levels and test for consistency of data with background
- includes scripts for easy updating of figures for quarterly reports
- variety of trend tests available
- able to handle parametric, nonparametric, continuous, discrete, univariate, and multivariate distributions
- can generate and customize plots using the graphical user interface and have MATLAB create the script associated with generating the plot
Limitations and Data Requirements
- cost
- moderate to difficult to use; requires some experience with basic programming
- advanced use (scripting) required for many features
- limited functionality without the addition of the Statistics Toolbox
- challenging to import data that are not in spreadsheet, text, or image files
- limited customization of plots using the graphical user interface
Publication Date: December 2013